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Myogenic cell cultures derived from muscle biopsies are excellent models for human cell differentiation. We report the first 
comprehensive analysis of myogenesis-specific DNA hyper- and hypo-methylation throughout the genome for human 
muscle progenitor cells (both myoblasts and myotubes) and skeletal muscle tissue vs. 30 non-muscle samples using 
reduced representation bisulfite sequencing. We also focused on four genes with extensive hyper- or hypo-methylation 
in the muscle lineage [PAX3, TBXl, MYH7B/MIR499 and OBSCN) to compare DNA methylation, DNasel hypersensitivity, 
histone modification, and CTCF binding profiles. We found that myogenic hypermethylation was strongly associated 
with homeobox or T-box genes and muscle hypomethylation with contractile fiber genes. Nonetheless, there was no 
simple relationship between differential gene expression and myogenic differential methylation, rather only for subsets 
of these genes, such as contractile fiber genes. Skeletal muscle retained -30% of the hypomethylated sites but only ~3% 
of hypermethylated sites seen in myogenic progenitor cells. By enzymatic assays, skeletal muscle was 2-fold enriched 
globally in genomic 5-hydroxymethylcytosine (5-hmC) vs. myoblasts or myotubes and was the only sample type 
enriched in 5-hmC at tested myogenic hypermethylated sites in PAX3/CCDC140 and TBXh TETl and TET2 RNAs, which are 
involved in generation of 5-hmC and DNA demethylation, were strongly upregulated in myoblasts and myotubes. Our 
findings implicate de novo methylation predominantly before the myoblast stage and demethylation before and after 
the myotube stage in control of transcription and co-transcriptional RNA processing. They also suggest that, in muscle, 
TETl orTET2 are involved in active demethylation and in formation of stable 5-hmC residues. 



Introduction 

The importance of tissue-specific differences in human DNA 
methylation' is emerging with much greater clarity because of 
the recent availability of human methylome profiles^ and the 
recognition of 5-hydroxymethylcytosine (5-hmC) as the sixth 
genetically programmed base in mammalian DNA.' This modi- 
fied base appears to be a functional component of the epigenome 
as well as an intermediate in the conversion of certain genomic 5- 
methylcytosine (5-mC) residues to C residues."'* DNA methylome 
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analysis can be used to study differentiation- or disease-linked 
differences in human epigenetics from a wide range of tissues and 
cell types more readily than can chromatin epigenetic profiling. 
Moreover, it gives insights into chromatin epigenetics, with which 
it is often coupled, albeit sometimes in complex relationships.^ 

The differentiation of mononuclear myoblasts (Mb) to mul- 
tinuclear myotubes (Mt) provides an in vitro model for complex 
differentiation-linked changes in cellular physiology that are 
critical during embryogenesis and postnatal, regenerative muscle 
repair. The muscle lineage-specific hyper- and hypo-methylation 
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throughout the genome has not been examined, although there 
have been studies of promoter methylation in skeletal muscle^ '" 
or myoblasts vs. adipose cell-derived myocytes." Therefore, by 
reduced representation bisulfite sequencing (RRBS),'^ we pro- 
filed DNA methylation in untransformed Mb cultures and Mt 
samples derived from them as well as in 16 types of cell cultures 
established from normal human tissues in order to identify 
myogenesis-associated changes in DNA methylation. In addi- 
tion, we generated analogous profiles from skeletal muscle and 
diverse non-muscle tissues. The Mb preparations used for this 
study retain the ability to efficiently differentiate into Mt, and 
the Mb had the expected specificity of their expression profiles," 
indicating that they largely maintain their in vivo characteristics. 

RRBS has single-base resolution and detects CpGs in gene 
bodies as well as intergenic regions, CpG island (CGI) and non- 
island sequences, and both single-copy and repeated sequences, 
even though only -5% of genomic CpGs are detected by this pro- 
cedure. By sampling only a fraction of CpG sites, but including 
all the major classes of DNA sequences, RRBS enables compari- 
son of methylomes from many different samples. Because stan- 
dard bisulfite-based techniques to study DNA methylation, like 
RRBS, are generally unable to distinguish genomic 5-mC that is 
5-hmC''' from that which is just methylated, we also determined 
the total genomic 5-hmC content of myogenic progenitor cells, 
skeletal muscle, and several types of non-muscle cell cultures or 
tissues and differentiated 5-hmC from 5-mC in several key gene 
regions. With the major exception of brain, there is usually less 
than 5% as much 5-hmC as 5-mC in human tissues.''*'"' 

We report many novel findings about myogenesis- and muscle- 
associated epigenetics from our comparison of the methylomes 
of an unusually large number of various human samples (32 
cell cultures and 16 tissue samples) and from enzymatic assays 
for genomic 5-hmC. Hypermethylation in Mb and Mt showed 
a very strong association with genes encoding sequence-specific 
transcription factors, including homeodomain and T-box pro- 
teins. The vast majority of this hypermethylation was lost in skel- 
etal muscle. Hypomethylation in skeletal muscle had very strong 
associations with genes encoding muscle-specific proteins, and 
these were much stronger than those for hypomethylation in Mb 
and Mt. To better understand the biological context of myogenic 
hyper- and hypo-methylation, we examined in detail the loca- 
tions of differentially methylated (DM) sites and myogenesis- 
associated chromatin epigenetic marks (DNasel hypersensitiv- 
ity, histone modification and CTCF binding) for four genes 
that were associated with 20 to 131 DM sites. We also looked 
for correlations with expression of these genes that previously 
had no description of their DNA methylation status or none 
of their methylation outside the promoter region. Two of these 
genes displaying myogenic hypermethylation, PAX3 (a homeo- 
box gene) and TBXl (a T-box gene), encode transcription factors 
involved in various developmental pathways, including myogen- 
esis.'^'" PAX3 is associated with the craniofacial-deafness-hand 
syndrome, the Waardenburg syndrome, and rhabdomyosar- 
coma and TBXl with the DiGeorge 22qll.2 deletion syndrome 
and conotruncal heart defects. '^'''''•^'' In tested myogenic hyper- 
methylated sites associated with these genes, we found that 



skeletal muscle was the only sample enriched in 5-hmC. The 
other two highlighted genes, MYH7B/MIR499 and OBSCN, 
which exhibited mostly muscle-lineage hypomethylation, are 
important in the formation of skeletal muscle as well as in cer- 
tain other types of organogenesis and are implicated in cardio- 
myopathy.-''^^ Our study reveals the different timing of de novo 
methylation and demethylation during muscle formation, indi- 
cates the importance of differentially methylated sites in centrally 
located exons and introns in the muscle lineage, and provides 
new insights into epigenetic changes in genes that are clinically 
or developmentally important. 

Results 

Hyper- and hypo-methylation in myogenic progenitors and 
muscle vs. non-muscle samples. We compared RRBS DNA 
methylation profiles from immunocytochemically character- 
ized skeletal muscle progenitor cell cultures (nine Mb and Mt 
preparations) with those of 15 different types of non-transformed 
human cell strains plus Epstein-Barr virus-transformed lympho- 
blastoid cell lines (LCLs; Table SI). For each sample type, 2 to 
8 biological or technical replicates were included. The Mb cul- 
tures (-70% confluent) were derived from quadriceps biopsies of 
two control individuals, (27 y, F, and 46 y, M), one patient with 
inclusion body myositis (IBM; 74 y, F), and gastrocnemius or 
quadriceps from two genetically confirmed FSHD (an autosomal 
dominant disease) patients (29 y, M, and 14 y, F). Mb cultures 
were differentiated to Mt samples (> 70% of nuclei in multinucle- 
ated, desmin-positive cells) by serum limitation. The inclusion 
of the IBM Mb sample provided a disease control for FSHD 
myogenic progenitor cells and a relatively rare myoblast sample 
from an older individual for comparison to the skeletal muscle 
samples from individuals of similarly advanced ages. Our previ- 
ous expression profiling on exon-based microarrays of normal- 
control, IBM, and FSHD Mb and Mt samples'^ indicated that 
these samples had mostly similar, myogenesis-specific expression 
profiles, although there were significant differences between the 
FSHD and control myogenic samples. However, these were usu- 
ally only in the extent of myogenesis-associated changes in gene 
expression. 

The number of RRBS sites per sample ranged from about 
1 to 2 million. Approximately 50% of these sites should overlap 
CGIs.'^ Some of the sites appearing to be 5-mC in RRBS data sets 
might be 5-hmC; however, in most cell types other than brain, 
the amount of C methylation vastly outnumbers C hydroxy- 
methylation.''' Therefore, we use the term "DNA methylation" to 
include both 5-mC and 5-hmC. We compared the Mb and Mt 
samples as a group (MbMt) to the non-muscle cell types because 
the differences between Mb and Mt were very much less than 
between these samples and non-muscle cell cultures. Combining 
them greatly increased our statistical power for detecting myo- 
genic DM sites. Stringent criteria were set for determining DM 
sites in the MbMt data set vs. the other cell cultures, namely, at 
least a 50% difference in methylation in the myogenic cells vs. 
the other cell cultures at a significance level of p < 0.01, as deter- 
mined by fitted binomial regression models at each monitored 
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CpG site. With these criteria, 19,640 sites (1.7% of those assayed 
that had sufficient coverage) exhibited significant myogenesis- 
associated methylation. There was strong overrepresentation of 
myogenesis-relevant functional group terms for nearby genes, as 
described below. In addition, we examined RRBS profiles of two 
skeletal muscle samples (referred to as "muscle"; 83 y, F, and 71 y, 
M, with technical duplicates) vs. those from 12 different types of 
normal, non-muscle tissues plus two types of short-term, primary 
cell cultures (pancreatic islets and hepatocytes) . 

Of the total MbMt-associated DM sites, 9,592 (49%) were 
hypermethylated vs. non-muscle cell cultures and 10,048 (51%) 
were hypomethylated (Fig. lA). In contrast, the distribution 
of the 12,016 DM sites in skeletal muscle vs. other tissues was 
highly skewed, with 761 sites (6%) hypermethylated and 11,255 
sites (94%) hypomethylated. We mapped DM sites to the clos- 
est gene and then determined the overlap of genes that exhibited 
muscle and MbMt differential methylation. For this analysis we 
used genes containing at least two DM sites that were hypo- 
methylated or two that were hypermethylated; genes with both 
hypo- and hyper-methylated DM sites were designated according 
to which type of DM site was in the majority. Although 91% of 
the genes with hypermethylation at the Mb and Mt stages did 
not display hypermethylation in muscle, many of the genes that 
were hypermethylated in muscle were also hypermethylated in 
Mb and Mt samples (74 of the 89 genes, 83%; Fig. IB). This 
indicates that muscle hypermethylation consisted mostly of 
the small percentage of retained MbMt hypermethylation. In 
contrast, only 47% of the genes that were hypomethylated in 
muscle were also hypomethylated in the MbMt group (885 of 
1,876 genes), indicating that much of the muscle hypomethyl- 
ation was established after the Mt stage. Another difference 
between myogenic hyper- and hypo-methylation is that a high 
density of MbMt DM sites was more prevalent among the hyper- 
methylated genes (Fig. IC vs. D) and hypermethylated sites were 
much more frequently arrayed within 5 kb of the transcription 
start site (TSS) than were hypomethylated sites (Fig. IE and F, 
blue bars). That muscle tissue retained an even tighter distribu- 
tion of hypermethylated sites around the TSS than did Mb and 
Mt (Fig. SI) suggests the importance of this hypermethylation 
in repressing or reinforcing repression of many homeobox genes 
selectively in the muscle lineage. 

Functional and regional correlates of myogenic hypo- and 
hyper-methylation. To test for preferential distribution of DM 
sites in different subgenic regions, we mapped all the DM sites 
to the following regions with respect to a single RefSeq isoform 
for each gene: 2 kb upstream of the TSS (-2 kb) through the first 
intron, the last exon plus 2 kb downstream, the internal exons, 
the internal introns, and intergenic regions > 2 kb from the gene 
ends. About 34.5 and 45.5% of the MbMt and muscle DM sites 
were in internal exons or introns, respectively (Table S2B). When 
the ratio of the number of internal-exon to internal-intron DM 
sites was normalized for the average size of the exons and introns 
in these genes, there appeared to be a strong bias for myogenic 
differential methylation in exons (Table S2C). However, when 
the number of RRBS-detected CpG sites was used for the nor- 
malization, no genome-wide bias toward exons vs. introns was 



seen because RRBS had a strong preference for detecting CpGs 
in exons, which is likely to be of biological significance.^'' 

Genes were then divided into those with mostly myogenic 
hyper- or hypo-methylation and examined with the Genomic 
Regions Enrichment of Annotations Tool (GREAT^'). This 
program uses proximity and functional data to map small epi- 
genetically defined subregions or sites to the TSS of one or more 
nearby genes. The strongest associations between myogenic DM 
sites and functional terms for nearby genes were for MbMt- 
hypermethylated sites and genes encoding sequence-specific DNA 
binding proteins, in general, or homeobox and T-box transcrip- 
tion factors, in particular (Table S2A). There were 1,671 and 351 
sites, respectively, in these functional groupings (FDR Q-values 
< 10'^"°). Because RRBS favors CGIs, we checked whether 
these associations were due to such a bias. This was not the case 
because upon analysis of only the myogenic DM sites that did 
not overlap a CGI, the above functional terms were still highly 
significantly overrepresented among MbMt-hypermethylated 
sites (data not shown). Moreover, there was a strong associa- 
tion of histone genes with CGIs but no association of myogenic 
hypermethylation with histone genes. In addition, T-box genes 
were not associated with CGIs despite their strong association 
with MbMt-hypermethylated DM sites. Furthermore, the asso- 
ciation of myogenic hypermethylation with homeobox genes 
was much stronger than that for CGIs. The highly significant 
associations of the promoters of muscle hypomethylated, MbMt- 
hypermethylated, or MbMt-hypomethylated genes with binding 
motifs for the myogenic transcription factors MYOG, MYODl, 
or the E-protein heterodimer E12/E47 (432, 740 and 528 sites, 
respectively) attests to the muscle-relevance of the myogenic DM 
sites (Table S2A). 

For MbMt-hypomethylated sites, muscle-protein terms were 
not among the most significantly overrepresented GO catego- 
ries (Table S2A) even though they were for genes displaying Mt 
overexpression.'^ Instead, small GTPase regulator activity (660 
sites) and other signaling terms as well as focal adhesion were 
the most strongly associated with hypomethylation. Muscle- 
hypomethylated sites exhibited the strongest associations with 
several GO terms related to muscle constituents, namely, con- 
tractile fiber (475 sites) and actin cytoskeleton (856 sites). 

To look for associations between myogenic differential 
methylation and expression, we mined data from our previous 
exon-based expression microarray study of Mb or Mt samples 
vs. 19 types of non-muscle cell cultures. Despite often strong 
upregulation of expression at the Mb or Mt stages, 9 of 32 con- 
tractile-fiber genes (including OBSCN, see below) were heavily 
methylated in Mb or Mt at sites that displayed hypomethylation 
in muscle (Table S3). This suggests that DNA hypomethylation 
was often a consequence of the initiation of expression rather 
than its precedent, although it might thereby reinforce expres- 
sion. However, this was not always the case, e.g., the sarcoplasmic 
protein-encoding CASQl gene exhibited little expression in Mb 
and much more in Mt (Table S3) but had an intragenic site that 
was hypomethylated in all Mb as well as Mt samples. 

For many homeobox genes with MbMt hyper- 
methylation, there was the expected inverse correlation between 
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Figure 1. Distribution of myogenic hyper- and hypo-methylated sites. Genome-wide distribution of sites that were differentially methylated (DM) 
in myoblasts and myotubes vs. non-muscle cell cultures or in skeletal muscle vs. non-muscle tissues are shown. (A) Numbers of DIVI sites. (B) Venn 
diagram with overlap of genes associates with > 1 DM site using the majority type of DM site associated with a gene for its hyper- or hypomethylated 
classification. (C and D) Number of hyper- and hypo-methylated sites per associated gene as determined by closestBed analysis. (E and F) Distance of 
from the associated gene's transcription start site (TSS) of all muscle-hypomethylated or MbMt-hypermethylated CpG sites (blue bars) or just DM sites 
from genes encoding contractile fiber proteins or genes with a conserved homeobox site as determined by GREAT analysis. 



the hypermethylation and expression in Mb and Mt deter- 
mined by expression microarray profiling. Of 68 MbMt- 
hypermethylated homeobox genes represented on the expression 
microarray, 30 were significantly downregulated, 7 significantly 
upregulated, and 31 displayed no significant difference in 



expression in myogenic vs. non-myogenic cells, often with low 
levels of RNA in most cell types (data not shown). There was 
enrichment within 5 kb of the TSS of MbMt-hypermethylated 
sites in homeobox genes, which was greater than that for muscle- 
hypomethylated sites in contractile fiber genes (Fig. IE and F, 
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Table 1. Most of the myogenesis-associated differential methylation does not display a simple correlation with expression levels of the closest gene' 



Number of genes with significant expression differences in Mb or IVIt vs. non-myogenic cell cultures | 


Gene sets 


Upreg > 2x 


Dnreg > 2x No change in expression in myogenic cells 


Genes with > 1 hypermeth site 


84 


64 128 J 


Genes with > 1 hypometh site 


174 


106 307 


All genes 


2076 


1473 5265 J 


Ratio of hyper to hypo-meth genes 


0.48 


0.60 0.42 



"Hyper- or hypo-methylated sites in myogenic progenitor cells were mapped to genes by closestBed analysis. The number of genes with more than a 
2-fold change in expression refers to genes with significant (adjusted p < 0.01) up- or down-regulation of either Mb or Mt vs. 19 non-muscle cultures as 
determined with an exon expression microarray. 



green bars). These results suggest that myogenic hypermeth- 
ylation is helping to downregulate expression of many of the 
homeobox genes from the extended promoter region in myogenic 
vs. non-myogenic cells while for others it may be just reinforcing 
repressive chromatin marks. 

Three of the six T-box genes with MbMt hypermethylation 
are members of the TBXl subfamily of TBX genes. These 
three genes were strongly upregulated in myogenic progenitor 
cells despite having hypermethylated sites within 3 kb upstream 
{TBX15, TBX18 and TBXl). Two of them {TBX15 and TBXl) 
had > 20 hypermethylated sites in the first exon or intron 
(Table S3). TBXl exhibited the second highest number of MbMt- 
hypermethylated sites per gene (131), and SIM2, the highest 
number (178). SIM2 is a candidate gene for Down syndrome 
and, like TBXl, encodes a master regulator transcription factor 
whose developmental roles include those in skeletal muscle. 
Like TBXl, SIM2 had many MbMt-hypermethylated sites in the 
extended promoter region and specific expression in Mb samples 
(data not shown). Moreover, no overall relationship was seen 
between myogenesis-specific expression and myogenic hyper- or 
hypo-methylation when all genes with more than 1 MbMt DM 
site were considered (Table 1). Whether muscle-hypomethylated 
genes with more than 1 hypomethylated site or those with at least 
10 were considered, only about 25% of these genes were associ- 
ated with a muscle functional term (muscle phenotype, myopa- 
thy, or at least the 90th percentile expression in muscle; www. 
genecards.org, GeneDecks, data not shown). Therefore, only cer- 
tain genes or small subgroups of genes, like homeobox genes in 
Mb and Mt, displayed such associations. 

DNA methylation in myotubes vs. myoblasts and FSHD vs. 
control myogenic cells. Comparison of Mb and Mt samples to 
each other (three each, excluding the FSHD samples) indicated 
less than 5% as much differential methylation between these 
two myogenic progenitor cell types as for comparisons of myo- 
genic cells or tissues with non-myogenic samples. In this part of 
the study, we considered only autosomal DM sites because the 
set of Mb vs. Mt DM sites was enriched in X-linked sites (12% 
of all DM sites vs. 2% for all other classes of DM sites), probably 
as an artifact of the small sample size and different genders in 
the Mb vs. Mt comparison. Only 583 autosomal sites exhibited 
significant differential methylation in Mt vs. Mb compared with 
19,460 for MbMt vs. other cell cultures and 11881 for muscle vs. 
other tissues (Table S2). Moreover, only 11% of the genes with 
Mt vs. Mb differential methylation had at least two hypo- or 



two hyper-methylated DM sites compared with 63 and 59% of 
genes with differential methylation for MbMt vs. other cell cul- 
tures or muscle vs. other tissues, respectively (Fig. IC and D). 
If Mt samples were displaying some of the strong trend toward 
the loss of myogenic hypermethylated sites seen in muscle vs. 
Mb-plus-Mt, the ratio of hypo- to hyper-methylated sites for Mt 
vs. -Mb should have been intermediate to that for the other two 
comparisons (1.05 and 14.8, respectively; Fig. lA); but, instead, 
it was smaller than both (0.4). These results indicate that most 
of the relatively small number of sites with Mt-vs.-Mb differen- 
tial methylation (3% the number of MbMt vs. non-muscle cell 
DM sites) is due to sample variation. Although the small sample 
size (three Mb vs. three Mt samples) may have limited the sen- 
sitivity of the comparison to detect infrequent Mt vs. Mb differ- 
ential methylation, it is clear that there was much more extensive 
differential methylation between Mt or Mb muscle progenitor 
cells and skeletal muscle than between the Mb and Mt stages 
(Fig. IB). 

Because FSHD Mb and Mt samples were shown to have highly 
significant dysregulation of expression of some of their genes rela- 
tive to controls,"'^* we compared their DNA methylation pro- 
files (four FSHD to five normal controls and one disease-control; 
Fig. IC and D). There were 1348 autosomal DM sites, which 
was only 7—11% of the DM sites for the comparisons of myogenic 
progenitors to non-muscle cells or muscle to non-muscle tissues 
(Table S2). That the number of hypo- vs. hyper-methylated sites 
was so skewed (84% of these FSHD vs. control DM sites was 
hypomethylated) suggests biological relevance. The percent- 
age of genes associated with at least two hypo- or two hyper- 
methylated sites in FSHD vs. control myogenic progenitors was 
24%, twice as much as for the Mb vs. Mt comparison but much 
less than for myogenic-non-myogenic comparisons (Fig. IC 
and D). There was significant enrichment for homeobox genes 
but this enrichment was small compared with myogenic-non- 
myogenic comparisons (Table 52). However, as described below, 
an independent test of two FSHD-hypomethylated sites associ- 
ated with a homeobox gene did not show significant FSHD dif- 
ferential methylation. 

Total genomic 5-hmC in muscle vs. myogenic progenitors. 
Upon mining data from our previous expression microarray pro- 
filing,'' we found significant increases in steady-state levels of 
RNA encoding the 5-hmC-generatingTETl andTET2 enzymes 
in control Mb or Mt vs. the 19 other cell types (p = 3 x 10 '' 
to < 1 X lO-"^) but not for TET3. TETl and TET2 RNA levels in 
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Figure 2. Myogenic hypermethylation at PAX3 and the adjacent 
CCDC140 gene region. From the UCSC Genome Browser (http://genome. 
ucsc.edu), the following tracks are shown for the PAX3 and CCDC140 
gene region (chr2: 223,058,677-223,174,523; all coordinates are relative 
to hg19): RefSeq genes, RNA-seq data (not strand-specific; CalTech); 
DNase-seq data (this study); and representative RRBS data (this study). 
The cell strains and tissues illustrated are as follows: IBM IVlb, Ctl Mb3, Ctl 
Mt3, Ctl Mb7, Ctl Mt7, FSH Mb5, FSH Mt5, FSH Mb8, FSH MtS, LCL, Fibi, 
Fib2, Fib3, HiVlEC, SAEC, ESC, Muscle A1, Muscle A2, Muscle B1, Muscle 
B2, leukocyte, skin, brain and lung. The short horizontal black bars for 
the RRBS tracks indicate clusters of CpG hypermethylation in Mb, Mt, 
muscle, and Fib2 and Fib3. 



these myogenic progenitor cells were higher than those of all the 
other nontransformed cell types analyzed, with the exception 
of HI embryonal stem cells (ESC), which had very high lev- 
els of TETl RNA, as expected.'* The analyzed non-transformed 
cell types were skin fibroblasts, astrocytes, hepatocytes, human 
umbilical cord endothelial cells (HUVEC), human mammary 
epithelial cells (HMEC), melanocytes, normal human epi- 
dermal keratinocytes (NHEK), and osteoblasts in addition to 
Mb, Mt, and ESC. The ratio of the average TETl RNA signals 
for Mb vs. the average signal from the nontransformed, non- 
ESC cell types studied was 6.5, and for Mt, 2.8. The respective 
increases for TETl were 3.6 and 3.4. 

Given the evidence that 5-hmC residues in DNA can function 
as an intermediate in the conversion of 5-mC to unmethylated 
C residues,^' we analyzed relative levels of genomic 5-hmC in 
an enzymatic assay for the transfer of glucose to 5-hmC, with 
nonglucosylated phage T4 DNA as a standard."" Skeletal muscle 
DNA had an average 5-hmC level of 0.077 mol%, which was 
about twice that of control or FSHD myoblasts or myotubes 
(p < 10 ''; Table 54). The overall genomic 5-hmC contents of the 
Mb, Mt, skin fibroblast, and sperm samples and the amniocytes 



and chorionic villus-derived cells at passage 2 to 3 were simi- 
lar (0.030 to 0.043 mol%). As expected,^"^ cerebellum DNA 
had very much higher 5-hmC levels than the other samples 
(0.57 mol%). The decrease in 5-hmC levels in DNA from leu- 
kocytes to an Epstein-Barr virus-transformed B-cell line (lym- 
phoblastoid cell line, LCL) could be due to the effects of in vitro 
oncogenic transformation." 

Myogenic hypermethylation profiles in genes expressed in 
myoblasts: PAX3 and TBXl. Like many other homeobox or 
T-box transcription factor-encoding genes, PAX3 and TBXl, 
were hypermethylated in Mb and Mt. Their DNA and chromatin 
epigenetics were examined in detail to gain insight into the prob- 
able functional significance of myogenesis-associated epigenetic 
marks. These two genes have important roles in muscle or limb 
formation and other types of organogenesis and generate alter- 
natively spliced RNAs encoding different protein isoforms.^* '"'^' 
In addition to the 28 MbMt-hypermethylated sites in the gene 
body of PAX3, there were 58 MbMt-hypermethylated sites in 
the divergently transcribed, partly overlapping CCDC140 gene 
or CCDCl40's downstream region. Both genes were weakly, but 
selectively, transcribed in Mb (Fig. 2; Fig. S2A). The myogenic 
hypermethylated sites within CCDC140 were mostly in DNA 
sequences conserved between rodents and humans, although this 
gene of unknown function is primate-specific. The CCDC140- 
associated sites with MbMt hypermethylation were L8— 9.2 kb 
upstream to the PAX3 TSS. Nine MbMt-hypermethylated sites 
within the single intron of CCDC140 were retained in muscle. 
These results suggest that, in the muscle lineage, CCDC140 
intragenic sequences act in cis as PAX3 transcription regulatory 
elements, in addition to any independent gene functionality that 
they might have. Consistent with this interpretation, there are 
neural-specific and hypaxial somite enhancers within the mouse 
region similar to the single intron of CCDC140 and immediately 
downstream of the CCDCl40-XiV& region in addition to a second 
neural enhancer in intron 4 of Pax3.^''^^''' Throughout the PAX3/ 
CCDC140 region, there was a significant, but low, enrichment in 
H3K9me3 in Mb and Mt (Fig. S2A). At the bidirectional pro- 
moter region there were clusters of MbMt-hypermethylated sites 
on either side of an unmethylated CGI and a DNasel hypersensi- 
tive site (DHS) that was specific for Mb, Mt, and melanocytes (Fig. 
2, pink boxes). We used the Cufflinks tool for transcript analy- 
sis of RNA-seq data''' to quantify RNA-seq-determined expres- 
sion levels (ENCODE/CalTecy of different RNA isoforms. 
We found that there were low concentrations of various PAX3 
RNA isoforms (Table 55), including several described in human 
skeletal muscle.'^ Consistent with PAX3's, role in adult melano- 
cyte stem cell differentiation,"* it displayed DM sites, namely, 
a cluster of CpG sites with DNA methylation preferentially in 
melanocytes that was downstream of CCDC140 (Fig. 52b). 
This region partially overlapped sites enriched in methylation in 
Mb. 

TBXl, the other transcription factor gene examined in detail, 
had five times more hypermethylated sites in the MbMt set (131) 
than in skeletal muscle (26 ; Fig. 3 ; Fig. S3A) . It was upregulated 
in both progenitor cells (Table 55) and skeletal muscle tissue.'^''* 
Cortese et af had previously studied methylation just in the 
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promoter region at the tissue level and found hypermethylation 
in skeletal muscle vs. heart muscle or liver. Consistent with prefer- 
ential expression of TBXl in Mb and Mt (Fig. 3), these myogenic 
progenitor cells had a broad band of H3K36me3 enrichment 
(Fig. S3B) not seen in the other examined cell types except, to a 
lesser extent, in HUVEC, which also expressed this gene (Table 
S5). H3K36me3 is typically found downstream of the 5' end of 
active genes.'*" Remarkably, a gap in this band of H3K36me3 
signal was centered over a cluster of 79 MbMt-hypermethylated 
sites (Fig. 3; Fig. S3B) overlapping a CGI and exons 7-9. 

Besides the large clusters of MbMt-hypermethylated DM 
sites in the interior of TBXl, there were hypermethylated sites 
in intron 1, including a triplet of MbMt- and muscle-hyper- 
methylated sites only -100 bp downstream of the TSS (Fig. 3). 
Surprisingly, in the extended promoter region, there was no 
peak of H3K4 methylation in Mb, Mt, or HUVEC nor was 
there a DHS, as typically seen in active or poised promoters.'" 
Nonetheless, analysis of RNA-seq profiles (ENCODE/CalTech) 
indicated a predominant, full-length canonical transcript in Mb 
and HUVEC (2082 nt; NM_080647, TBXlc) starting at the 
single canonical promoter region and ending at the cluster of 
79 MbMt-hypermethylated sites (Table S5). Instead of the con- 
ventional peak of H3K4me3 at an active promoter region there 
was only a broad and strong H3K4me3 peak centered over intron 
3 in Mb, Mt, and HUVEC, the only expressing cell types. It 
was surrounded on either side by MbMt-hypermethylated sites. 
In muscle, this peak was significantly hypomethylated at CpGs 
vs. non-muscle tissues (Fig. 3, blue box). In Mb and Mt, this 
same region was mostly unmethylated even though the CpG 
sites did not reach the level of significant differences from the 
non-muscle cell cultures. From this region, two minor tran- 
scripts seemed to originate that had retained intronic sequences 
(Table S5). Examination of small transcript (20-200 nt) RNA- 
seq data (ENCODE/Cold Spring Harbor) did not indicate any 
small intragenic ncRNA originating in this internal promoter- 
like chromatin. 

In the canonical, 5' promoter region of TBXl, there was a clus- 
ter of MbMt- and muscle-hypermethylated sites at -1.2 kb (Fig. 3, 
blue arrow). At -2.6 kb, there was a single MbMt- and muscle- 
hypomethylated site (Fig. 3) that is near a subregion of Mb-specific 
AS transcription (Fig. S3A). The only non-muscle cell culture 
that was mostly unmethylated at this -2.6 kb site was choroid 
plexus epithelial cells, a finding that might be relevant to the pro- 
posed neurological functioning of TBXl}^ Yet further upstream 
at -11 kb, there was a muscle- and MbMt-hypomethylated CpG 
site near a Mb- and Mt-specific DHS and myogenic enhancer- 
like peaks of H3K4Mel and H3K27Ac (Fig. 3, purple boxes). 
Another type of epigenetic mark that distinguished Mb and Mt 
was the differential presence of a binding site for the DNA bind- 
ing protein CTCF, which helps organize chromatin in cis.*^ From 
CTCF ChlP-seq (ENCODE/Broad), only Mb and Mt lacked 
CTCF binding sites in the TSS region and at the 3' end of the 
TBXlc gene isoform (Fig. 3; Fig. S3B). Therefore, the MbMt- 
hypermethylated subregions of TBXl at the beginning and end 
of the TBXlc isoform indicated in Figure 3 might be linked to 
alterations in the gene's higher-order structure via CTCF. As 



Figure 3. Predominant myogenic hypermetPiylation at TBXl, including 
the promoter region, despite high expression. TBXl (chr22: 19,732,116- 
19,771,300) epigenetic tracl<s from the UCSC Genome Browser are 
illustrated as in Figure 2 with the additions of tracks for sites exhibiting 
significant myogenic hypo- or hypermethylation (this study); histone 
H3 modifications (Broad); and CTCF ChlP-seq profiles (Broad). A cluster 
of MbiVlt-hypermethylated sites is shown in an expanded view (chr22: 
19,744,510-19,744,710) at the bottom of the figure with the following 
representative cell types: Ctl Mb3, Ctl Mt3, Ctl Mb7, Ctl Mt7, LCL, Fibi, 
osteoblasts, HMEC, IIVlR-90, small airway epithelial cells (SA epith), and 
ESC for sequencing reads of > 5. 



expected from the high level of expression of this gene in myo- 
genic cells, including skeletal muscle tissue, Mb and Mt did not 
have the broad band of repression-associated H3K27Me3 and 
EZH2 binding'''' across the upstream and 5' end of the gene that 
was seen in non-myogenic cells (Fig. S3B). 

Assay for 5-hmC vs. 5-mC at specific sites in TBXl and 
PAX3I CCDC140 regions. We used an enzymatic assay to quan- 
tify 5-hmC and 5-mC at CCGG sites (Epimark, New England 
Biolabs) overlapping three MbMt-hypermethylated CGs associ- 
ated with PAX3 or TBXl. Like the above-described enzymatic 
assay for total genomic 5-hmC, this site-specific assay involves 
incubation with T4 (3-glucosyltransferase. After incubation or a 
mock-enzyme incubation, aliquots are digested with Mspl, which 
can cleave CCGG sites whether or not they are methylated or 
hydroxymethylated at the internal C residue but not if they con- 
tain glucosylated 5-hmC. In parallel, digestions are done with 
Hpall, which can cleave CCGG sites only if they are unmodified 
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Figure 4. Only skeletal muscle had considerable 5-hmC at assayed CpG sites associated with TBXl or CCDC140/PAX3. Each of the indicated samples was 
assayed with or without incubation with T4 (J-giucosyitransferase or mock treatment, followed by Mspl, Hpall or mock incubation, and then quantita- 
tive PCR to determine 5-hmC and 5-mC levels at the assayed sites. 



at the internal C, and other aUquots are used as uncut controls. 
Upon real-time PCR of all of these mixtures, 5-hmC and 5-mC 
in the middle of the analyzed Mspl site is quantified by subtrac- 
tion of the threshold cycles (Ct). 

The first Mspl site that we examined was 7.9 kb upstream of 
PAX3 in a CGI shore (0.3 kb from the CGI) and 1.6 kb down- 
stream of CCDC140, the immediate neighbor of PAX3. The 
second site was in exon 9 of TBXl in the CGI with 79 MbMt- 
hypermethylated sites (8% total CpG; Figs. 2 and 3). These 
genes are weakly {PAX3 and CCDC140) or strongly {TBXl) 
transcribed in Mb with preferential transcription in myogenic 
vs. non-myogenic cells. We tested four to six biological repli- 
cates of normal-control Mb, FSHD Mb, normal skeletal muscle, 
heart, leukocyte and cerebellum, almost all of which had not 
been included in the RRBS analysis. We found that essentially 
all the modification of C residues for both sites was 5-mC, rather 
than 5-hmC, in Mb samples (Fig. 4). For the /MJO-upstream, 
CCD C/4^0-downstream site, the 5-mC status is consistent with 
the cell type-specific enrichment of FI3K9me3 signal throughout 
PAX3/CCDC140 region in Mb (Fig. S2A). Unlike the Mb sam- 
ples, the skeletal muscle samples were enriched in 5-hmC at both 
the TBXl and PAX3 sites relative to all other samples, including 
cerebellum (t-test, p < 0.001). Therefore, these genomic regions 
reflect the higher global 5-hmC content in skeletal muscle vs. 
Mb or Mt, as described above. Remarkably, in a given sample, 
the ratios of 5-hmC to 5-mC were similar for the tested PAX3 
upstream site and the TBXl exon 9 site even though these ratios 



differed much among the skeletal muscle samples. At a MbMt- 
hypermethylated site in intron 3 of PAX3, similar results for 
Mb and skeletal muscle vs. non-muscle samples were found in a 
smaller set of samples (2 each; data not shown) . 

The above enzyme assays also confirmed the RRBS findings 
of hypermethylation in the muscle lineage. We chose the PAX3- 
upstream site for testing because, according to the RRBS data, 
it was not only a MbMt-hypermethylated site but also one of the 
17 sites in this region that was significantly hypomethylated 
(most sites at p < lO'*") in FSHD MbMt vs. analogous control 
MbMt (Fig. 2). The above-mentioned PAX3 intronic site also 
displayed FSHD hypomethylation in RRBS data set. However, 
the results from the enzyme-PCR assay at the /MX3-upstream 
site did not give statistically significant differences at the p < 0.05 
level (t-test, p = 0.07) between FSHD and control Mb samples 
(Fig. 4). Similarly, the site in PAX3 intron 3 that was examined 
in three FSHD and two normal-control Mb samples did not vali- 
date the RRBS-indicated difference between FSHD samples and 
controls (data not shown). Therefore, our finding of a modest 
number of RRBS sites with significant FSHD vs. control dif- 
ferential methylation by RRBS might just be due to individual 
variation and small sample sizes in the FSHD vs. control compar- 
ison. This would be consistent with the FSHD DM sites lacking 
strong associations with functional groups other than homeobox 
genes and lacking clustering of most sites unlike MbMt vs. non- 
muscle DM sites and muscle vs. non-muscle DM sites 
(Fig. IC and D; Table S2A). 
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Myogenic hypomethylation profiles: 
MYH7B/MIR499 and OBSCN. To eluci- 
date the functionality of gene-body hypo- 
methylation in the muscle lineage, the 
DNA and chromatin epigenetics of OBSCN 
(> 80 exons) and MYH7B (45 exons) were 
examined. These genes had 72 and 20 
intragenic muscle-hypomethylated CpG 
sites and 2 and 12 MbMt-hypomethylated 
CpGs, respectively (Figs. 5 and 6). Both 
encode proteins that are found mostly in 
skeletal muscle and heart and, at lower lev- 
els, in parts of the brain and certain other 
tissues.^3,37.3s,45,46 OBSCN protdn prod- 
uct is involved in sarcomere structure and 
important in the skeletal and heart muscle 
lineages.^' In contrast, MYH7B is impor- 
tant in skeletal and cardiac muscle mainly 
because of its microRNA, MIR499, which 
is released from an intron (human intron 
22), rather than because of its protein prod- 
uct. ^'■''^ Although we did not include any 
muscle samples other than skeletal muscle 
in our statistical analyses of myogenesis- 
associated DM sites, we also used heart 
DNA for RRBS profiling of DNA meth- 
ylation. Many of the MYH7B sites that 
displayed tissue-associated hypomethyl- 
ation in skeletal muscle also had little or 
no methylation in heart DNA; however, 
in OBSCN there was less sharing of hypo- 
methylated sites between these two types of 
muscle (Figs. 5 and 6; Figs S4 and S5). In 
addition, for OBSCN, MYH7B, PAX3 and 
TBXl, MbMt DM sites usually showed a 
similar methylation status in IBM Mb as in 
the normal-control Mb samples (Figs. 2, 5 
and 6; Fig. S3A). 

The many muscle-hypomethylated 
sites in OBSCN and MYH7B were mostly 
in internal exons and introns. The ratios 
of DM sites in internal exons vs. introns 
were 2.5 and 3.0 for OBSCN unA MYH7B, 
respectively. This is considerably higher 
than the respective ratios of 1.0 and 1.6 for 
all RRBS-detected CpGs in these two genes 
and 0.36 for all RRBS-detected CpGs in all 
genes associated with muscle-lineage DM 
sites (Table S2C). These results suggest that 
the generation of alternate RNA isoforms 
for both genes^''^^ '"' might be controlled, 
in part, by differential DNA methylation. 
NEB and TTN, two other genes that encode giant sarcomeric 
proteins, also displayed skeletal muscle-associated DNA hypo- 
methylation preferentially in internal exons (5 and 8 myogenic 
DM sites, respectively; Table S3). 
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Figure 5. Myogenic hypomethylation in the OBSCN, which encodes a giant muscle-associated 
protein. DNA and chromatin epigenetic marl<s for OBSCN (chrl; 228,378,622-228,575,138; includ- 
ing ClorfHS) are indicated as in the previous figures with the addition of Mt and Mb data for 
histone H3 methylation and acetylation (Broad, http://genome.ucsc.edu) shown in bar format. 
At the bottom of the figure is an enlarged view of skeletal muscle-associated hypomethylation 
at the exon 8/intron 8 border (chrl : 228,404,946-228,405,003) for the following samples: IBM 
Mb, Ctl Mb3, Ctl Mt3, Ctl Mb7, Ctl Mt7, Fibi, HMEC, ESC, Muscle A1, Muscle A2, Muscle B1, Muscle 
B2, leukocyte, kidney, stomach and testis. Cardiac muscle, which was not included in the set of 
samples for determination of muscle-specific DM sites, is also shown. Myogenesis-associated 
epigenetic marks that are referenced in the text are indicated by boxes or triangles. 



The only two OBSCN MbMt-hypomethylated sites were 
13 bp apart at the 3' end of the gene (Fig. 5, orange box). Their 
retention in skeletal muscle signifies their probable impor- 
tance. These sites and two of the small intragenic clusters of 
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Figure 6. Myogenesis hypomethylated sites in MYH7B/MIR499 over an exon whose 
inclusion triggers nonsense-mediated decay of mRNA. DNA and chromatin epigenetic 
marks are indicated as in the previous figures for half of the exons of MYH7B (exons 1-23) 
and the 5' end of GSS (chr20: 33,542,729-33,578,756). An expanded view of exons 10-18 
(chr20: 33,570,058-33,575,875) is shown at the bottom of the figure. The representative 
cell strains and tissues illustrated are as follows: IBM Mb, Ctl Mb3, Ctl Mt3, Ctl Mb7, Ctl 
Mt7, LCL, Fibi, HIVIEC, ESC, Muscle A1, Muscle A2, Muscle B1, Muscle 82, leukocyte, kidney, 
brain, stomach and cardiac muscle. DHS and CTCF peaks that were specific to Mb and Mt 
are boxed in purple. 



muscle-hypomethylated sites (Fig. 5, purple triangles) resided in 
Mt chromatin that was enhancer-hke in its enrichment, albeit 
low, in H3K4mel and H3K27ac. The increase in these enhancer- 
like H3 modifications in Mt relative to Mb paralleled the much 
higher, but still modest, level of expression of OBSCN in Mt vs. 
Mb as determined from our expression microarray profiling. 
Many of the DM sites in this gene overlapped exons or were near 
putative alternative TSS (UCSC genes track in Fig. S4). At the 
Mb stage, an assortment of variant RNA isoforms were seen, but 



not the major muscle-associated OBSCN isoform 
a or b transcripts (Table S5). 

Muscle-lineage DNA hypermethylation was 
observed at only two CpG sites, 11 bp apart, and 
only in Mb and Mt. These were located at the 
border of an OBSCN CGI overlapping exons 2—4 
(Fig. 5, blue triangle). These sites were immedi- 
ately upstream of the 5' end of novel antisense tran- 
scripts {Clorfl45 and AK056556), whose 3' ends 
are in OBSCN's promoter region (Fig. S4). The 
MbMt-hypermethylated sites were surrounded by 
clusters of muscle-hypomethylated sites (Fig. 5, 
bottom) that were indicative of DNA demethyl- 
ation occurring after the Mt stage. OBSCN pro- 
tein is implicated in the assembly and stabilization 
myofibrils^' and is most abundant in the muscle 
lineage, especially in skeletal muscle tissue.'^''* 
When hypomethylated in muscle, this subregion 
of the OBSCN gene might up regulate expression of 
the antisense transcripts and thereby help increase 
expression of OBSCN after formation of myofi- 
brils, which form subsequent to the Mt stage. 

MYH7B displayed remarkably specific skeletal 
and cardiac muscle hypomethylation in exon 10 
and exons 17—18. These DM sites surround small 
Mb- and Mt-specific DHS and a CTCF binding 
site mostly associated with Mb and Mt (Fig. 6, 
purple boxes; Fig. 55). In skeletal muscle (and 
heart) there was an apparent spreading of MbMt- 
hypomethylation in exons 17 and 18 to the intron 
between them and to an additional site in exon 18. 
Importantly, exon 10 had seven CpGs that were 
hypomethylated in skeletal muscle (and heart) 
but highly methylated in all Mb and Mt samples 
and all non-muscle tissue samples. This unusual 
exon is largely excluded from the mature mRNA 
in C2C12 Mb and Mt, cardiac muscle, and many 
types of skeletal muscle but it is often included in 
brain. ^' Its exclusion leads to the creation of a pre- 
mature nonsense codon and nonsense-mediated 
decay (NMD) of the mRNA in the cytoplasm.^' 
We found that MYH7B RNA was present at too 
low steady-state levels in Mb for accurate analy- 
sis of RNA isoforms from RNA-seq data. This 
probably reflects much NMD of the transcript 
in human Mb and Mt, as previously observed for 
C2C12 cells. Nonetheless, there were increases in 
steady-state RNA levels for this gene in human (our microarray 
expression data) and C2C12-' Mt vs. Mb. Our results suggest 
that differential methylation of exon 10 helps guide alternative 
splicing of this gene. 

Discussion 



Our RRBS profiling of DNA methylation in Mb, Mt, skeletal mus- 
cle, and non-muscle samples indicates that myogenesis-associated 
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changes in DNA methylation before the Mb stage involve similar 
amounts of demethylation and de novo methylation but, after 
the Mt stage, predominantly demethylation. Skeletal muscle lost 
> 90% of the MbMt-hypermethylated sites and acquired many 
additional hypomethylated sites. Much of this loss of hypermeth- 
ylation was in genes encoding sequence-specific transcription 
factors, especially homeobox and T-box genes. There was highly 
significant enrichment in muscle-associated genes among muscle- 
hypomethylated genes. Nonetheless, only about 25% of the 
genes associated with muscle hypomethylation were specifically 
expressed in muscle or linked to a muscle phenotype. In addi- 
tion, there were no simple overall relationships between MbMt- 
hypo- or hyper-methylation and Mb- or Mt-associated expres- 
sion, only for certain subsets of these genes, like contractile fiber 
genes. 

Because the majority of nuclei in skeletal muscle tissue sam- 
ples are postmitotic, the decreased DNA methylation in skeletal 
muscle vs. Mb and Mt cannot be due to passive demethylation, 
which relies on replicative DNA synthesis.'" Instead, our results 
offer the first evidence there is active, genome-wide demethylation 
at CpGs in skeletal muscle. Implicated in pathways for active 
demethylation of mammalian DNA are TETl, TET2, and TET3 
enzymes. They catalyze the formation of 5-hmC as well as 5- 
formylcytosine (5-fC) and 5-carboxymethylcytosine (5-caC) 
from S-hmC.^''"'" Evidence for several DNA-demethylating path- 
ways downstream of the TET-catalyzed formation of 5-hmC, 
5-fC or 5-caC has been reported although the details remain to 
be definitively established.*'''^ ''' Surprisingly, we found that Mb 
and Mt samples have much higher levels of TET2 RNA than any 
of ten analyzed non-muscle cell strains and more TETl RNA, 
with the expected exception'' of embryonal stem cells. Skeletal 
muscle also has a high level of TETl RNA relative to many tissue 
types,'' and TETl was expressed in only about half of the tested 
tissue types, including skeletal muscle." TET-catalyzed hydroxy- 
methylation of 5-mC can also lead to passive DNA demethylation 
by interfering specifically with DNMTl maintenance methyla- 
tion.'^ TETl- or TET2-faciliated passive demethylation might 
occur at a pre-myoblast, cell cycling stage to help generate the 
Mb- and Mt-associated DNA hypomethylation that we observed. 

Using an enzymatic assay for global genomic 5-hmC, we 
showed that overall levels of this 5-mC-derived base were twice as 
high in skeletal muscle as in Mb or Mt samples but similar in Mb, 
Mt, cultured skin fibroblasts, and early-passage chorionic villus 
cells and amniocytes. These results indicate that some of the 
hydroxylation of genomic 5-mC catalyzed by TET enzymes in 
the muscle lineage leads to increases in stably maintained 5-hmC 
DNA residues after the Mt stage. With another enzymatic assay 
on various samples, we demonstrated that considerable amounts 
of 5-hmC were detected only in skeletal muscle tissue at three 
tested CpG sites in upstream, intronic, or 3'-terminal clusters of 
CpGs exhibiting MbMt hypermethylation. In contrast, Mb had 
5-mC as essentially the only modified C residue at these MbMt- 
hypermethylated sites. The genes tested were PAX3 andTBXl, 
homeobox and T-box genes, respectively. They are preferentially 
expressed in myogenic progenitor cells, have important roles in 
myogenesis and other developmental programs,'^'"''" and, like 



many other homeobox and T-box transcription factor genes, 
were enriched in MbMt-hypermethylated sites. At the tissue 
level, skeletal muscle has elevated levels of TBXl RNA and of 
one PAX3 RNA isoform^*^'^^'"* vs. most non-muscle tissues and 
retained some of the MbMt-hypermethylation (Figs. 2 and 3). 

We propose that myogenesis-associated 5-mC and 5-hmC 
CpG sites help positively or negatively regulate transcription or 
co-transcriptional posttranscriptional RNA processing depend- 
ing on the sequence and developmental context. Evidence for 
5-hmC as well as 5-mC performing such roles has recently been 
reported for non-myogenic DM sites. '''"'""^^ Cell type-specific 
epigenetic marks, including enhancer-type chromatin modifi- 
cations, are often inside the body of genes. '''•'''' We found that 
about 35-45% of all RRBS-detected MbMt- or muscle-DM 
sites were in internal exons or introns. One of the four genes that 
we highlighted was OBSCN, which encodes a giant structural 
and signaling protein associated with the formation of myofi- 
brils subsequent to the Mt stage. It had 72 intragenic muscle- 
hypomethylated sites distributed over many of its > 80 internal 
exons and introns. Most of the OBSCN exons encode modular 
protein domains with compatible splice donor and acceptor sites 
enabling complex splicing patterns to produce functional protein 
isoforms. Therefore, modulation of alternative splicing by differ- 
ential gene methylation could lead to differentiation-related differ- 
ences in the resulting proteins. This is consistent with Mb having 
OBSCN RNA isoforms that differed from those associated with 
skeletal muscle and Mb lacking most of the O-S^CA'^DNA hypo- 
methylation seen in skeletal muscle. At two intragenic 
OBSCN subregions that displayed muscle- but not MbMt- 
hypomethylation, Mt samples had chromatin epigenetic features 
that were typical of weak enhancers. This indicates enhancer- 
like chromatin epigenetic changes preceding post-mitotic DNA 
demethylation for this gene. 

We found evidence for cis-regulation of transcription of one 
gene by a differentially methylated internal regulatory element 
of a neighboring gene. PAX3 encodes an essential homeobox/ 
paired-box protein that contributes to early striated muscle devel- 
opment and partly overlaps 5'-to-5' the CCDC140 gene. CCDC140 
is a primate-specific gene that encodes a protein of unknown 
function. Mb, Mt, and skeletal muscle had clusters of myo- 
genic hypermethylation (including 5-hmC in skeletal muscle) 
in intron 3 of PAX3 and also in the single CCDC140 intron 
and downstream of CCDC140 (Fig. 2). The myogenic DM 
sites in the CCDC140 intron and downstream of the gene are 
-1—1.5 kb from neural or hypaxial somite enhancers of mouse 
Pax3?^'^~ They are located in sequences that are highly conserved 
between mice and humans despite the lack of a CCDC140 gene in 
mice. Such conservation is also seen in the region of MbMt hyper- 
methylation in PAX3 intron 3. These MbMt-hypermethylated sites 
and the less abundant muscle-hypermethylated sites in CCDC140 
and PAX3 introns might prevent leaky activation of development- 
specific non-muscle enhancers of PAX3 in the muscle lineage. 

TBXl offers an especially intriguing example of myogenic 
differential methylation in a gene of major clinical importance. 
Haploinsufficiency of TBXl is responsible for most of the symp- 
toms of the DiGeorge/velo-cardio-facial syndrome, the most 



www.landesbioscience.com 



Epigenetics 



327 



common microdeletion syndrome in humans.''''^' Symptoms 
include congenital heart and craniofacial abnormalities, skeletal 
muscle hypotonia, and increased risk of psychiatric disorders. 
Variations in expression levels in vivo may be relevant to the 
disparities in genotype-phenotype correlations/'' In addition to 
its role in the above syndrome, TBXl has been separately impli- 
cated in behavioral disorders/^ Control of transcription of this 
gene and posttranscriptional regulation''^ have to be exquisitely 
regulated because both haploinsufficiency and overexpression are 
deleterious, and this gene is differentially expressed in different 
cell types and tissues. TBXl had an extraordinarily high number 
(127) of intragenic MbMt-hypermethylated sites as well as one 
MbMt- and muscle-hypomethylated site and three MbMt- and 
muscle-hypermethylated sites in the 3-kb upstream region. From 
exon 7 to exon 9 of the TBXl isoform c, there is a 1.7-kb CGI 
that contains 79 of the MbMt hypermethylated sites and a mod- 
erate CpG content (8%). Moderate CpG contents in CGIs favor 
multiple 5-hmC residues,"^* and indeed, skeletal muscle, though 
not Mb or other tested tissue types, had considerable 5-hmC at 
the assayed site in this region (Fig. 4) . 

There are several indications of the possible functional sig- 
nificance of the dense myogenic hypermethylation of TBXl's 
large intragenic CGI and its relationship to preferential expres- 
sion of TBXl in myogenic cells. This gene's 5-mC or 5-hmC 
residues in the muscle lineage might influence retention of the 
overlapping cassette exon, which is the terminal exon of TBXlc 
RNA, the predominant RNA isoform in Mb (Table S5). In Mb 
and Mt, this CGI has a distinct chromatin structure lacking the 
H3K36me3 signal of the surrounding sequences which, along 
with DNA methylation, might help regulate whether exon 
9 is included in the mRNA or not.*"' This CGI also overlaps 
chromatin with enhancer- and promoter-type histone modifi- 
cations in HMEC but not in Mb and Mt samples (Fig. S3). 
Therefore, its hypermethylation in myogenic cells might down- 
modulate the activity of an enhancer specific for certain other 
cell types. Lastly, centered over intron 3 at -2 kb upstream of 
this CGI, there was a large peak of H3K4me3 specific to Mb, 
Mt, and HUVEC as well as hypomethylated CpG sites in muscle 
(Fig. 3). Surprisingly, these cells contained no other I-I3K4me3 
peak in the TBXl vicinity even though H3K4me3 is gener- 
ally associated with active promoters, and this gene was very 
highly expressed in myogenic progenitor cells and moderately 
expressed in HUVEC to give predominantly a full-length RNA 
starting at the canonical promoter. Because H3K4me3 peaks 
sometimes also indicate active enhancers, this H3K4me3 peak 
with the dense clusters of hypermethylation several kb to either 
side may be a potent tissue-specific enhancer. It could thereby 
reverse a pharyngeal arch-associated silencer element detected 
in the largely orthologous murine intron2-to-intron3 region 
of Thxl.^^ The complexity of the cis-regulatory elements in 
TBXl'^'' is consistent with the complexity that we observed at 
the epigenetic level and is probably necessary to precisely regu- 
late the gene's expression in a cell type- and developmentally- 
specific way. TBXl not only plays a critical role in regulating 
the number of myocytes in the developing limb"'' but its dif- 
ferential regulation also extends to muscle type (quadriceps. 



soleus and gastrocnemius; in order of relative expression).^" 
Therefore, some of the variability seen in 5-hmC contents of 
different skeletal muscle samples at the tested exon-9 site in the 
present study might be explained by a variety of types of skeletal 
muscle in our samples. The exact muscle type was unknown 
for these autopsy samples. In contrast to TBXl's importance in 
both developing and fully formed skeletal muscle tissue, it is 
critical for proper heart formation^' but has little or no expres- 
sion in mouse heart.'** Correspondingly, we observed that most 
of the hypermethylated and hypomethylated CpGs in skeletal 
muscle within or immediately upstream of TBXl did not show 
a similar methylation status in heart. 

MbMt- and skeletal muscle-hypomethylation was seen at 
many sites within the MYH7B/MIR499 gene. Heart shared a 
low methylation status at most of these sites with skeletal muscle 
(Fig. 6). MIR499, the miRNA, which is generated after exci- 
sion from MYH7B intron 22, helps specify slow vs. fast skeletal 
muscle fiber type and is implicated in myogenesis, cardiac muscle 
formation, and cardiac stress-response. ^'•^^ ''^ MIR499 is present 
at very different steady-state levels in different types of skeletal 
muscle and in heart; the relative order of expression is ventricle, 
soleus, gastrocnemius, and quadriceps. ^'■''^ The MYH7B/MIR499 
gene is very unusual because although it is most strongly asso- 
ciated with skeletal and cardiac muscle, its predominant RNA 
isoform in those tissues skips MYH7B exon 10 and thereby gener- 
ates only the miRNA. The full-length mRNA with the retained 
exon 10 that can encode the MYH7B protein is produced in very 
low amounts in these tissues. The lack of protein product from 
the exon 10-skipped RNA isoform is due to a shift in reading 
frame and NMD upon exclusion of exon 10, which causes the 
destruction of the transcript in the cytoplasm without affecting 
the levels of the intronic-encoded miRNA.^''^^'''^ 

The observed hypomethylation in alternative exon 10 of 
MYH7B in skeletal and cardiac muscle, but not in Mb or Mt, as 
well as the hypomethylation in exons 17 and 18, might increase 
the efficiency of skipping of exonlO in a tissue- and develop- 
mental-specific manner. Brain, which was fully methylated in 
these regions, has much more inclusion of cassette exon 10 in 
its mRNA.-' The increased ratio of MYH7B protein to MIR499 
RNA in brain may be linked to MYH7B protein being involved 
in the regulation of brain synapses.'*'' The DNA hypomethylation, 
the myogenesis-associated CTCF binding site, and the small DHS 
between exons 10 and 17 might contribute to this tissue-specific 
splicing by affecting the rate of transcription elongation.''' In addi- 
tion to association with differentially spliced exons, MbMt-specific 
hypomethylation in MYH7B probably helps control the levels of 
transcription. For example, 1 kb downstream of the TSS there was 
muscle-lineage associated DNA hypomethylation. 

Therefore, our study suggests that differential methylation 
of many genes that are important for forming skeletal muscle, 
like MYH7B/MIR499, TBXl and OBSCN, is involved in both 
regulation of expression levels and cell type-specific process- 
ing of pre-mRNA. Consistent with this conclusion, differential 
DNA methylation has previously been implicated in myogenesis 
from studies of several specific genes and from treating mouse 
myoblasts with 5-azacytidine, a DNA methylation inhibitor.^^'^* 
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Expression of some genes, e.g., MYH7B and TBXl, varies with 
muscle fiber-type. Therefore, DNA methylation may also help 
regulate metabolic and structural changes in skeletal muscle in 
response to physiological alterations, e.g., exercise,^' given the espe- 
cially dynamic nature of this tissue. 

Materials and Methods 

Samples. Mb cell strains used for methylation analysis were prop- 
agated from muscle biopsy samples of FSHD and control individ- 
uals (Table SI)." Mt samples were obtained from these myoblast 
cell strains by serum limitation for 5 d. By immunostaining," 
we demonstrated that all batches of myoblasts contained > 90% 
desmin-positive cells and that myotube preparations consisted of 
> 70% of the nuclei in multinucleated cells desmin-positive and 
myosin heavy chain-positive cells. FSHD myoblasts were derived 
from clinically and molecularly diagnosed patients with disease- 
associated shortening of the D4Z4 repeat array. The other cell 
cultures and tissue samples for DNA methylation profiling are 
also shown in Table SI. 

DNA methylation profiling. High-molecular weight DNA 
was extracted from Mb and Mt samples and used for reduced 
representation bisulfite sequence (RRBS) for genome-wide DNA 
methylation profiling. RRBS was done using Mspl to digest 1-jJtg 
aliquots of DNA before bisulfite treatment and next-generation 
sequencing on an lUumina platform.'^'^' RRBS data are avail- 
able from the UCSC genome browser (http://genome.ucsc.edu/ 
cgi-bin/hgTrackUi?hgsid=292099017&c=chrl&g=wgEncode 
HaibMethylRrbs) . In addition, for a few genes, DNA methyla- 
tion profiles from the Infinium Human Methylation 450 plat- 
form (http : / / genome.ucsc.edu/cgi-bin/hgTrackUi ?hgsid=29209 
96l9&c=chr20&g=wgEncodeHaibMethyl450), which also uses 
bisulfite-treated DNA, were used to supplement the RRBS pro- 
files to qualitatively evaluate correlations between DNA methyla- 
tion and gene expression. 

Statistical analyses of DNA methylation. All statistical anal- 
yses were performed using R version 2.147^ BED files containing 
DNA methylation data for each sample from the UCSC genome 
browser (http://genome.ucsc.edu) were aggregated into matrices 
with rows included for each site detected in any sample, with 
one matrix for tissues and another for cell lines. Both matrices 
included approximately 2 million sites spanning the 24 chromo- 
somes, and these were stored and accessed using the bigmemory 
package.^^ To assess differential methylation in each of our com- 
parisons, a binomial regression model was fit for each site for which 
read counts and methylation levels were present for a representa- 
tive number of samples in each group. Where appropriate, random 
effects terms were included to adjust for systematic variation due 
to tissue subtypes and/or technical replication using the nlme4 
package.^' The significance of the methylation change at each site 
was assessed using the p value for the group effect in each com- 
parison. To increase the specificity of our analyses, we restricted 
our attention to those sites for which a change in methylation per- 
centage of at least 50% was observed at a significance level of 0.01 
or below, and these DM sites were output in the BED file format 
for further bioinformatics analysis. 



The program closestBed*" was employed to determine the 
nearest gene for each DM site using a subset of the contents 
of the UCSC refFlat table (http://genome.ucsc.edu/cgi-bin/ 
hgTables, 5/01/12) including only one isoform per gene. We gave 
precedence first to those isoforms chosen for best fit in our study 
of exon-based microarray gene expression'^^ (Affymetrix) of Mb 
and Mt and then to those isoforms with the RefSeq accession 
id "NM*" for protein-encoding genes. In the event that closest- 
Bed reported multiple genes overlapping a given site, tie-breaking 
rules were employed to select a single gene on the basis of criteria 
such as proximity of the 5' end to the DM site. In other cases, 
the first gene encountered in genomic order among multiple 
equidistant genes not overlapping the given DM site was cho- 
sen as the tiebreaker. Thus having determined one output gene 
isoform record for each input DM site, we composed an R script 
to count the frequency of occurrence of both hyper- and hypo- 
methylated DM sites in each of the following genomic subre- 
gions: 2 kb upstream of TSS, a singleton exon, first non-singleton 
exon and intron, other internal exons and introns, last exon plus 
2 kb downstream, and intergenic regions excluding 2 kb upstream 
of the TSS or downstream of the TTS. 

DNasel hypersensitivity mapping. Genome-wide mapping of 
DNasel-hypersensitive sites (DHS) was done on two of the same 
batches of control myoblast and myotube samples that were used 
for RRBS (Table SI). The other two Mb and Mt pairs of samples 
were from two additional myoblast cell strains and were of simi- 
larly high quality when checked by immunostaining as described 
above. For DHS profiling, intact nuclei were treated with DNasel 
and the DNasel-hypersensitive fraction was analyzed by next- 
generation sequencing as previously described.**' Data from these 
three biological replicates were combined for the DHS tracks in 
the UCSC genome browser's ENCODE data (genome.ucsc.edu/ 
cgi-bin/hgTrackUi?hgsid=2920996l9&c=chr20&g=wgEncod 
eOpenChromDnase). DHS profiling of the non-myogenic cell 
cultures had been previously reported. ^''^^ 

Histone modification, CTCF binding, and transcription 
analyses. Data sets and sample information for histone modifi- 
cations and CTCF binding; non-strand-specific RNA-seq; and 
strand-specific RNA-seq [Long RNA-seq, > 200 nt poly(A)* 
RNA] were from the ENCODE project (http://genomes.ucsc. 
edu) via the laboratories of Bradley Bernstein (Broad Institute), 
Barbara Wold (California Institute of Technology), and Tom 
Gingeras (Cold Spring Harbor), respectively. RNA isoform anal- 
ysis and quantification was done with the CuffDiff tooP'*'^' using 
the above-mentioned ENCODE non-strand-specific RNA-seq 
database. Mb and Mt samples for ENCODE histone modifica- 
tion and CTCF profiling and Mb for RNA-seq had been com- 
mercially obtained, and no immunostaining for determination of 
percentage contamination with other cell types was described. In 
addition, we used previously unreported data from our expres- 
sion profiling of Mb and Mt vs. 19 types of non-muscle cell cul- 
tures on microarrays (Affymetrix Exon 1.0 ST'^). 

Determination of 5-hydroxymethylcytosine levels in 
genomic DNA and at specific CCGG sites. For the determina- 
tion of the levels of genomic 5-hmC, a T4 |3-glucosyltransferase 
(|3-GT; New England Biolabs) glucosylation assay was performed 
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on 2 |xg of purified genomic DNA incubated with UDP-PH] 
glucose and (3-GT and applied to a DE81 membrane with deter- 
mination of the membrane-bound radioactivity, as previously 
described."' All glucosylation reaction values were corrected 
for non-specific binding of UDP- PH] glucose to membranes 
using a DNA-minus reaction. Using a standard curve made 
from genomic T^-gt DNA (New England BioLabs), the fmol of 
5-hmC per microgram of genomic DNA was calculated and then 
converted to 5-hmC mol% of total genomic bases as previously 
described."' For analysis of levels of 5-hmC and 5-mC at a given 
Mspl site, we used another assay involving (i-GT (Epimark, New 
England Biolabs). DNA samples were treated with (B-GT or used 
for mock reactions, followed by digestion with Mspl, Hpall, or 
mock digestion, and then real-time PGR and assessment of meth- 
ylation status calculated by subtraction of Gt values according to 
the manufacturer's instructions. The forward and reverse prim- 
ers for PGR were as follows: TBXl, 5'-GTG GGA TGG AGG 
GAG GAT T-3' and 5'-GGG AAA GGT GGG GTG AAG-3'; 
PAX3 intron 3, 5'-GGT GAG AGT GTG GGA GGA AA-3' and 
5'-TTT GGT AAG GGG AAG GGA A-3'; and CCDC140 down- 
stream//MZ3 upstream, 5'-AAG TTG AGT GGA GAT TGG 
GGT TA-3' and 5'-AGA GGT GGT GGG AGA GGA-3'. 
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